\section{Developing Libraries with HOL4}\label{sec:libraries}

In the previous section we have looked at a first, self-contained, example of a
HOL4 proof.
To prove the closed form of the gaussian sum, we only needed to define a function
and perform a straight-forward proof by induction.
However, in larger developments it is common to split proofs into smaller
lemmas that are used as part of a central, final theorem\footnote{This is similar to how one would never write larger programs within the \texttt{main} function. It is desirable to split up functionality into programs instead}.

In this section we describe how larger developments are performed with HOL4 by
showing the theorem that there is an infinite number of prime numbers, called
euclid's theorem.

\subsection{Preamble}
Before starting a development in HOL4, it is recommended to declare dependencies
and load theorems that come with the HOL4 theorem prover.
We do so by running

\begin{lstlisting}
open BasicProvers Defn HolKernel Parse Conv SatisfySimps Tactic monadsyntax
     boolTheory bossLib arithmeticTheory;

open LassieLib arithTacticsLib;

val _ = new_theory "euclid";
\end{lstlisting}

The first \lstinline{open} loads a bunch of theories and tactics from the HOL4
git repository, whereas the second loads Lassie and the natural language
descriptions required for the proofs that we will perform.
The third line, tells HOL4 that we start a new theory, called ``euclid''.
In HOL4 speak, this is the analogous to defining an interface in Java, or a
header file in C.
All definitions and theorems are explicitly part of the interface of a theory.
To prevent errors, the file in which we store the theory has to be called
\lstinline{euclidScript.sml}.
If this correspondance between the file name and theory name is ignored,
HOL4 will fail to build the theory later.
As in the previous section, we recommend implementing file \lstinline{euclidScript.sml}
in the directory \texttt{\$LASSIEDIR/examples}.
We load the jargon with \lstinline{val _ = LassieLib.loadJargon "Arithmetic";}.

\subsection{Basic Definitions}
Our overall goal is to prove the HOL4 equivalent of the informal statement that
``there is an infinite number of prime numbers''.
The first concept that we need to define is thus what it means for a (natural)
number to be a prime number.

A number $n$ is called a prime number if it is only divisible by $1$ and $n$
itself. Thus, we first define a predicate \lstinline{divides} where
\lstinline{a divides b} if and only if \lstinline{b} can be expressed as a
multiple of \lstinline{a}:

\begin{lstlisting}
set_fixity "divides" (Infix(NONASSOC, 450));

Definition divides_def:
  (a divides b) = (? x. b = a * x)
End
\end{lstlisting}
The first line declares \lstinline{divides} as a new infix operator, like $+,-, \ldots$
Next, the \lstinline{Definition}, \lstinline{End} block defines \lstinline{divides} as a
binary infix relation where \lstinline{a divides b} is true, if and only if
there is an \lstinline{x} such that \lstinline{b} is the result of multiplying \lstinline{a} with \lstinline{x}.

Alternatively, we could have defined \lstinline{divides} as a function instead
of an infix operation, as we did in \autoref{sec:hol_ex1}.
Using it as an infix operation however makes it more obvious which number is
divided by which.

Having defined \lstinline{divides} we use it to define a predicate
\lstinline{prime} which is true, if its argument is a prime number.
\begin{lstlisting}
Definition prime_def:
  prime p = (p<>1 /\ !x . x divides p ==> (x=1) \/ (x=p))
End
\end{lstlisting}

The left-hand side of the conjunction (\lstinline{p <> 1}) explicitly excludes
number 1 from being a prime number, and the right-hand side states the HOL4
version of \lstinline{p} being prime if it can only be divided by $1$ and itself.

\subsection{Proving Infrastructural Lemmas}

Before proving euclid's theorem itself, we start by proving some infrastructural
lemmas that will come in handy later.
\begin{lstlisting}

Theorem DIVIDES_0:
  ! x . x divides 0
Proof
  nltac `[divides_def, MULT_CLAUSES] solves the goal.`
QED

Theorem DIVIDES_ZERO:
  ! x . (0 divides x) = (x = 0)
Proof
  nltac `[divides_def, MULT_CLAUSES] solves the goal.`
QED

Theorem DIVIDES_ONE:
  ! x . (x divides 1) = (x = 1)
Proof
  nltac `[divides_def, MULT_CLAUSES, MULT_EQ_1] solves the goal.`
QED

Theorem DIVIDES_REFL:
  ! x . x divides x
Proof
  nltac `[divides_def, MULT_CLAUSES] solves the goal.`
QED

Theorem DIVIDES_TRANS:
  ! a b c . a divides b /\ b divides c ==> a divides c
Proof
  nltac `[divides_def, MULT_ASSOC] solves the goal.`
QED
\end{lstlisting}

Theorems \lstinline{DIVIDES_0, DIVIDES_ZERO}, and \lstinline{DIVIDES_ONE} show
simple base cases for predicate \lstinline{divides}.
As these follow straight-forwardly from the definition, the Lassie proof is just
\lstinline{nltac `[divides_def, MULT_CLAUSES] solves the goal.`}, resp.
\lstinline{nltac `[divides_def, MULT_CLAUSES, MULT_EQ_-1] solves the goal.`}.

Similarly, one proves theorems about the relation between $+, *$ and $\leq$ and
\lstinline{divides}:
\begin{lstlisting}

Theorem DIVIDES_ADD:
  ! d a b . d divides a /\ d divides b ==> d divides (a + b)
Proof
  nltac `[divides_def, LEFT_ADD_DISTRIB] solves the goal.`
QED

Theorem DIVIDES_SUB:
  !d a b . d divides a /\ d divides b ==> d divides (a - b)
Proof
  nltac `[divides_def, LEFT_SUB_DISTRIB] solves the goal.`
QED

Theorem DIVIDES_ADDL:
  !d a b . d divides a /\ d divides (a + b) ==> d divides b
Proof
  nltac `[ADD_SUB, ADD_SYM, DIVIDES_SUB] solves the goal.`
QED

Theorem DIVIDES_LMUL:
  !d a x . d divides a ==> d divides (x * a)
Proof
  nltac `[divides_def, MULT_ASSOC, MULT_SYM] solves the goal.`
QED

Theorem DIVIDES_RMUL:
  !d a x . d divides a ==> d divides (a * x)
Proof
  nltac `[MULT_SYM,DIVIDES_LMUL] solves the goal.`
QED

Theorem DIVIDES_LE:
  !m n . m divides n ==> m <= n \/ (n = 0)
Proof
  nltac `rewrite [divides_def]. [] solves the goal.`
QED
\end{lstlisting}

\subsection{Euclid's Theorem}
Having defined prime numbers, and after proving simple properties of
\lstinline{divides}, we next state euclid's theorem and start
exploring its proof.

\begin{lstlisting}
Theorem euclid:
  !n . ?p . n < p /\ prime p
Proof

QED
\end{lstlisting}

After starting the interactive proof with \ekey{M-h g}, we can start exploring
it with Lassie.
The textbook version of the proof is done by contradiction, so we perform the
same step in HOL4: \lstinline{nltac `suppose not.`}.
After running the tactic with \ekey{M-h e} the REPL shows the following goal state:
%
\begin{lstlisting}[frame=single, mathescape=true]
> OK..
1 subgoal:
val it =

    0.  $\exists$ n. $\forall$ p. n < p $\rightarrow$ $\neg$prime p
   ------------------------------------
        F

   : proof
\end{lstlisting}

The first assumption starts with an existential quantifier, from which we can
obtain the witness. Therefore we next call into the simplifier to automatically
take care of this with \lstinline{nltac `simplify.`} leaving us with
%
\begin{lstlisting}[frame=single, mathescape=true]
> > > > > # # OK..
1 subgoal:
val it =

    0.  $\forall$ p. n < p $\rightarrow$ $\neg$prime p
   ------------------------------------
        F

   : proof
\end{lstlisting}

The assumption now tells us that any natural number $p$ which is greater than
$n$ cannot be prime.
On a high level, the goal is to derive a contradiction from this assumption by
finding a prime number that is bigger than $n$.
An integral part of this step is that every natural number greater than $1$ has
a prime factorization\footnote{We have to exclude $1$ here because our definition of prime numbers explicitly ruled out $1$.}.
Before continuing the proof, we prove a theorem that for an arbitrary natural
number $n$, there is a prime factor of that number.
To this end we first drop the current goal with \ekey{M-h d} and start proving:
%
\begin{lstlisting}
Theorem PRIME_FACTOR:
  !n . n <> 1 ==> ?p . prime p /\ p divides n
Proof
  LassieLib.nltac `
    Complete Induction on 'n'.
    rewrite [].
    perform a case split for 'prime n'.
    Goal 1. follows from [DIVIDES_REFL]. End.
    Goal 1.
      show '? x. x divides n and x <> 1 and x <> n' using (follows from [prime_def]).
      follows from [LESS_OR_EQ, PRIME_2, DIVIDES_LE, DIVIDES_TRANS, DIVIDES_0].
    End.`
\end{lstlisting}

The theorem can be immediately loaded by running \ekey{M-h M-r} over the complete
text.
However, we recommend stepping through the \lstinline{nltac} steps one-by-one
with \ekey{M-h e} after starting an interactive proof with \ekey{M-h g}.

One particular thing to note here is that this is the first proof that
explicitly mentions subgoals in Lassie.
After performing a case split on whether $n$ is prime or not
(\lstinline{perform a case split for 'prime n'.}) HOL4 leaves us with two
subgoals to prove:

\begin{lstlisting}[frame=single]
> OK..
2 subgoals:
val it =

    0.  $\forall$ m. m < n $\rightarrow$ m $\neq$ 1 $\rightarrow$ $\exists$ p. prime p $\wedge$ p divides m
    1.  n $\neq$ 1
    2.  $\neg$ prime n
   ------------------------------------
        $\exists$ p. prime p $\wedge$ p divides n

    0.  $\forall$ m. m < n $\rightarrow$ m $\neq$ 1 $\rightarrow$ $\exists$ p. prime p $\wedge$ p divides m
    1.  n $\neq$ 1
    2.  prime n
   ------------------------------------
        $\exists$ p. prime p $\wedge$ p divides n

2 subgoals
   : proof
\end{lstlisting}

Here, the first subgoal ($\exists \texttt{p}. \texttt{prime p} \wedge \texttt{p divides n}$)
is solved first, with the natural language command \lstinline{Goal 1}.
Alternatively, we can say that we want to prove first the goal where \texttt{prime n} holds with
\lstinline{Goal 'prime n'}.

Having shown the theorem \lstinline{PRIME_FACTOR} we can go back to proving
Euclid's theorem:
\begin{lstlisting}
Theorem euclid:
  !n . ?p . n < p /\ prime p
Proof
  nltac `suppose not. simplify.`
QED
\end{lstlisting}

As a next step, we will obtain a prime factor of $n! + 1$, called $q$.
From this we know that $q$ is prime and \lstinline{q divides FACT n + 1}.
As $q$ is a prime number, we can derive that $q \leq n$.
We obtain a contradiction by deriving that $q = 1$ from the fact that any number
smaller than $n$ is a divisor of $!n$, and theorem \lstinline{DIVIDES_ADDL}.
The full proofscript becomes:
\begin{lstlisting}[mathescape=true]
nltac `
  suppose not. simplify.
  we can derive 'FACT n + 1 <> 1' from [FACT_LESS, neq_zero].
  thus PRIME_FACTOR for 'FACT n + 1'.
  we further know '?q. prime q and q divides (FACT n + 1)'.
  show 'q <= n' using [NOT_LESS_EQUAL].
  show '0 < q' using [PRIME_POS] .
  show 'q divides FACT n' using [DIVIDES_FACT].
  show 'q=1' using [DIVIDES_ADDL, DIVIDES_ONE].
  show 'prime 1' using (simplify).
  [NOT_PRIME_1] solves the goal.`
\end{lstlisting}

We recommend stepping through each step one by one and observing the changes to
the proof.

After finishing the proof the theory development is ended by putting
\lstinline{val _ = export_theory();} at the end of the file.
This directive tells HOL4 to export the defined functions and proven theorems
into a file \texttt{euclidTheory.sml} which can be used by future developments.
The file \texttt{euclidTheory.sig} gives an overview of the included theorems and
definitions in a human readable format.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "lassie-tutorial"
%%% End:
